Abstract

The Internet-based encyclopaedia Wikipedia has grown to become one of the
most visited websites on the Internet. However, critics have questioned the quality
of its entries, and an empirical study has shown Wikipedia to contain errors in a
2005 sample of science entries. Biased coverage and lack of sources are among
the "Wikipedia risks". The present work describes a simple assessment of these
aspects by examining the outbound links from Wikipedia articles to articles in
scientific journals and comparing them against journal statistics from Journal Citation
Reports, such as impact factors. The results show an increasing use of structured
citation markup and good agreement with the citation pattern seen in the scientific
literature, though with a slight tendency to cite articles in high-impact journals such
as Nature and Science. These results increase confidence in Wikipedia as a good
information organizer for science in general.

Wikipedia is growing in popularity and will probably gain further importance for the
organization and dissemination of scientific research. But how can the articles of this freely
edited Internet-based encyclopaedia be trusted?

Inbound links can to some extent quantify the quality of a work; examples include
Google's PageRank for web pages and the impact factor of scientific journals. The
algorithms behind PageRank and Kleinberg's HITS can be adapted to Wikipedia,
but it is not clear whether high-scoring articles are also quality articles with respect to
content. It has been suggested that Wikipedia content surviving over a long period
and many edits may be deemed of high quality. On the other hand, studies have found
that highly edited articles are likely to be quality articles. Other proposals for quality
assessment use the revision history to compute a trust index for an article or an author
reputation index. Another feature of an article that may correlate with article quality is the
amount of outbound citation to "trusted" material, e.g., scientific articles. How common
are such citations, and does Wikipedia use them across scientific fields? Critics have noted that
Wikipedia may be biased at the corpus level, leaning towards topics that interest the
"young and Internet-savvy", and a possible lack of sources has also been noted.

Authors can include scientific references in Wikipedia by different means, most simply
by listing them at the bottom of the article. A more structured approach uses the
<ref> construct and the cite journal template, which allow for inline referencing and
consistent formatting. A user of the cite journal template needs to fill out the appropriate
bibliographic fields of the template, e.g., the fields for the article title and the name of the
journal. The structured citation markup makes it relatively easy to extract bibliographic
information and ask: How well do the outgoing scientific citations in Wikipedia compare
with the citations seen between scientific journals?

Figure 1: Correlations between citations to a journal from Wikipedia and from scientific
journals. Kendall's rank correlation (a) and its associated P-value (b) as a function of the
number of journals included in the test, e.g., the value at 80 shows the correlation between
Wikipedia citations and JCR numbers for the 80 most cited journals from Wikipedia.
The number of citations from Wikipedia is compared with three series of numbers from
JCR and one derived series: the total citations to a journal, its impact factor, the number of
articles, and the product of the total citations and impact factor.

To answer this question, programs written in Perl used regular expression matching to
extract the journal titles from the cite journal templates in all pages of
the English Wikipedia, obtained as an XML database dump file. A small list was set up
to match the different variations of journal titles, and the total number of citations
was then counted for each individual journal. The Journal Citation Reports (JCR) for 2005 from
Thomson Scientific provided statistics on citations between scientific journals.
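
The extraction step described above can be sketched as follows. This is a minimal
illustration in Python rather than the Perl programs actually used, which are not shown
in the text. The regular expressions, the alias table and the sample wikitext are all
assumptions made for the example, and the sketch assumes that templates are not nested.

```python
import re
from collections import Counter

# Body of a {{cite journal ...}} template (assumption: no nested templates).
CITE_JOURNAL = re.compile(r"\{\{\s*cite[ _]journal\b(.*?)\}\}",
                          re.IGNORECASE | re.DOTALL)
# Value of the "journal" field, up to the next field separator or template end.
JOURNAL_FIELD = re.compile(r"\|\s*journal\s*=\s*([^|}]*)", re.IGNORECASE)
# Wiki links such as [[Nature (journal)|Nature]] or [[Science]].
WIKILINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")

# Hypothetical alias list mapping title variants to one canonical title.
ALIASES = {
    "n engl j med": "New England Journal of Medicine",
    "nejm": "New England Journal of Medicine",
    "new england journal of medicine": "New England Journal of Medicine",
}


def normalize(title):
    """Strip wiki italics and extra whitespace, then apply the alias list."""
    cleaned = re.sub(r"'{2,}", "", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    key = cleaned.lower().replace(".", "")
    return ALIASES.get(key, cleaned)


def count_journal_citations(wikitext):
    """Count cite journal templates per (normalized) journal title."""
    counts = Counter()
    for template in CITE_JOURNAL.finditer(wikitext):
        # Replace wiki links by their label so that '|' only separates fields.
        body = WIKILINK.sub(r"\1", template.group(1))
        field = JOURNAL_FIELD.search(body)
        if field and field.group(1).strip():
            counts[normalize(field.group(1))] += 1
    return counts


# Invented sample wikitext, for illustration only.
sample = """
Text.<ref>{{cite journal | author=A. Author | title=Some title | journal=[[Nature (journal)|Nature]] | year=2005}}</ref>
More.<ref>{{Cite journal |title=Other |journal=N. Engl. J. Med. |year=2006}}</ref>
<ref>{{cite journal|journal=''Nature''|title=Third}}</ref>
{{cite book | title=Not a journal}}
"""

counts = count_journal_citations(sample)
print(counts.most_common())
```

On a real dump the same function would be applied to the text of every page, and the
alias list would have to be extended by hand as new title variants turn up.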

The regular expression matched 30368 outbound citations from the cite journal template
in the database dump for 2 April 2007. The summary statistics for the individual
journals with the largest number of inbound citations from Wikipedia showed Nature
(787), Science (669) and New England Journal of Medicine (NEJM) (446) at the top
(number of citations in parentheses). A number of astronomy journals received many
citations: The Astrophysical Journal (424), Astronomy & Astrophysics (154), Icarus,
International Journal of Solar System Studies (147) and The Astronomical Journal (93).
Apart from NEJM, other medical journals high on the list included The Lancet (268),
JAMA (217), British Medical Journal (187) and Annals of Internal Medicine (104). Some
newspapers and non-scientific journals also received citations via the cite journal template,
with, e.g., The New York Times (69) among the most referenced. These non-scientific
entries, as well as journals such as Scientific American and Physical Review (which, as a
"multivolume" journal, may be referenced in several ways), were excluded, and the rest of
the values were correlated against numbers obtained from JCR (Fig. 1). The Wikipedia
citation numbers showed high correlation with the JCR numbers for the total number
of citations to a journal. Wikipedia citation numbers correlated less with the JCR impact
factor and the JCR measure of the number of articles in a journal. With a value of 47.4,
Annual Review of Immunology has the highest impact factor, but because it publishes few
articles it receives relatively few citations, both from scientific journals and from
Wikipedia (18). The correlations depended on the number of journals included in the test,
with the largest correlation observed for the highly cited journals. This may simply reflect
that journals with a small number of citations make for noisy and poor statistics. In most
cases the highest correlation could be obtained by multiplying the total number of citations
by the impact factor, i.e., Wikipedia authors slightly overcite high-impact journals compared
to JCR numbers. The high correlation among top-cited journals with this combined number
means that the 10 journals with the highest value of this measure feature among the 19
most Wikipedia-referenced journals.

Figure 2: Comparison between citations from scientific journals and from Wikipedia.
Scatter plot with each dot representing the target journal receiving the citations, and
with one axis representing the number of citations from Wikipedia and the other the
product of two numbers: JCR total citations and impact factor. It shows the 100
most Wikipedia-referenced journals. Not all journal titles are shown in the plot.
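
The kind of analysis summarized in Fig. 1, Kendall's rank correlation computed for the
N journals most cited from Wikipedia, can be sketched as below. This is only an
illustration under stated assumptions: it uses the tie-corrected tau-b variant (the text
does not say which variant was used), it omits the P-value, and the journal rows are
invented toy numbers, not values from JCR or from the Wikipedia dump.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's rank correlation (tau-b, corrected for ties), O(n^2) pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both series: contributes to neither term
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Invented toy rows, for illustration only (NOT JCR or Wikipedia values):
# (label, wikipedia_citations, jcr_total_citations, impact_factor)
journals = [
    ("Journal A", 800, 400000, 30.0),
    ("Journal B", 650, 350000, 29.0),
    ("Journal C", 450, 180000, 40.0),
    ("Journal D", 420, 200000, 6.0),
    ("Journal E", 150, 60000, 4.0),
    ("Journal F", 100, 90000, 2.0),
    ("Journal G", 20, 15000, 45.0),
    ("Journal H", 10, 5000, 1.0),
]

# The JCR-side measures compared against the Wikipedia counts.
measures = {
    "total citations": lambda row: row[2],
    "impact factor": lambda row: row[3],
    "total citations x impact factor": lambda row: row[2] * row[3],
}


def correlation_by_top_n(rows, measure, min_n=3):
    """Tau between Wikipedia counts and a JCR measure for the top-N journals."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    result = {}
    for n in range(min_n, len(ranked) + 1):
        top = ranked[:n]
        result[n] = kendall_tau_b([row[1] for row in top],
                                  [measure(row) for row in top])
    return result


for name, measure in measures.items():
    for n, tau in correlation_by_top_n(journals, measure).items():
        print(f"{name:32s} N={n}: tau={tau:+.3f}")
```

Plotting the returned values against N for each measure gives curves of the kind shown
in panel (a) of Fig. 1.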

When individual journals are examined, Wikipedia citations to astronomy journals
stand out compared to the overall trend (Fig. 2). Australian botany journals also
received a considerable number of citations, e.g., Nuytsia (101), in part due to a concerted
effort for the genus Banksia, where several Wikipedia articles for Banksia species have
reached "featured article" status. Computer and Internet-related journals do not get
as many citations as one would expect if Wikipedia showed bias towards fields for the
"Internet-savvy": Communications of the ACM (34) was the most referenced of these. Of the
medical journals, BMJ received relatively many Wikipedia citations. Authors more often cite
freely available articles, and this may be particularly true for authors of the free
encyclopaedia. Since BMJ's research articles are free, the journal may gain extra citations
from this effect.

Citing Wikipedia as an authoritative source may be questionable given the present
state of review on Wikipedia, and some universities have even banned citations to
Wikipedia. But when citations to trusted material support its statements, Wikipedia may be
valuable for background reading. The present number of structured outbound citations
from Wikipedia is dwarfed by the total number of scientific citations in the
entire scientific literature. With this low number, dedicated enthusiasts can influence the
statistics by making relatively few edits, cf. Australian botany. However, the use of the
cite journal template has grown from zero in February 2005, when it was first introduced, to
19066 in November 2006, 24656 in February 2007, and a total of 30368 citations in April
2007. Reference management software (Zotero) now includes functionality for handling
Wikipedia citations. Thus the use of structured scientific citations in Wikipedia will very
likely continue to grow and increasingly benefit researchers who look for well-organized
pointers to original research.
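
As a rough worked example of the growth figures quoted above, the average number of new
cite journal citations per month between successive counts can be computed as below.
The counts are those given in the text. Measuring the intervals in whole calendar months
is an assumption, since exact dates are not given for the earlier counts.

```python
# (year, month, cumulative number of cite journal citations), from the text.
counts = [
    (2005, 2, 0),
    (2006, 11, 19066),
    (2007, 2, 24656),
    (2007, 4, 30368),
]


def monthly_growth(series):
    """Average new citations per month between successive counts.

    Assumption: intervals are measured in whole calendar months.
    """
    rates = []
    for (y0, m0, c0), (y1, m1, c1) in zip(series, series[1:]):
        months = (y1 - y0) * 12 + (m1 - m0)
        rates.append(((y0, m0), (y1, m1), months, (c1 - c0) / months))
    return rates


rates = monthly_growth(counts)
for start, end, months, per_month in rates:
    print(f"{start} to {end}: {months} months, "
          f"about {per_month:.0f} new citations per month")
```

Under this month-level approximation the average monthly increase rises from one interval
to the next, in line with the growth described in the text.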